Abstract — This paper investigates trellis structures of linear
block codes for the integrated circuit (IC) implementation of 
Viterbi decoders capable of achieving high decoding speed while 
satisfying a constraint on the structural complexity of the trellis 
in terms of the maximum number of states at any particular 
depth. Only uniform sectionalizations of the code trellis diagram
are considered. An upper-bound on the number of parallel and 
structurally identical (or isomorphic) subtrellises in a proper 
trellis for a code without exceeding the maximum state complex- 
ity of the minimal trellis of the code is first derived. Parallel 
structures of trellises with various section lengths for binary 
BCH and Reed-Muller (RM) codes of lengths 32 and 64 are 
analyzed. Next, the complexity of IC implementation of a Viterbi 
decoder based on an L-section trellis diagram for a code is
investigated. A structural property of a Viterbi decoder called 
add-compare-select (ACS)-connectivity which is related to state 
connectivity is introduced. This parameter affects the complexity 
of wire-routing (interconnections within the IC). The effect of 
five parameters namely: 1) effective computational complexity; 2) 
complexity of the ACS-circuit; 3) traceback complexity; 4) ACS- 
connectivity; and 5) branch complexity of a trellis diagram on 
the very large scale integration (VLSI) complexity of a Viterbi 
decoder is investigated. It is shown that an IC implementation of 
a Viterbi decoder based on a nonminimal trellis requires less area 
and is capable of operation at higher speed than one based on the 
minimal trellis when the commonly used ACS-array architecture 
is considered. 

Index Terms — ACS-array architecture, trellis diagram, Viterbi 
decoder. 


I. Introduction 

Any linear block code can theoretically be decoded by
applying the Viterbi algorithm to a trellis for the code.
Trellises for block codes were first described in [1]–[3]. After
Forney’s refinement of the structure of these trellises [4], their 
potential in the practical decoding of block codes has been 
realized by many others who have published extensively on 
various aspects of the trellis structure of block codes [5]–[25].
In some of the above papers, one goal was to minimize the 
maximum number of states in the trellis at any depth by 
considering all possible permutations of the code [6]. For 
some codes such as Reed-Muller (RM) codes, this optimum 
permutation is known [7]. For most others only bounds are
known. 

Even when the optimum order of bits is known or a good 
permutation is known (if the optimum order is unknown), 
previous work has focussed on minimization of the number 
of computations required for decoding [12], [22], [24]. If the 
actual decoding is intended to be performed using a stored pro- 
gram approach that executes the operations needed to decode a 
received vector sequentially, then this approach will lead to the 
fastest decoding speed. However, if an integrated circuit (IC) 
implementation is intended, then an alternative approach is 
more suitable. Given a constraint on the amount of hardware 
(determined by the number of states and the complexity of 
branches) in the decoder, decoding must be done as fast as 
possible; not necessarily with as few computations as possible. 
To achieve this end, we propose the use of nonminimal 
trellises with parallel structure in which the maximum state 
space dimension is not greater than the maximum state space 
dimension of the minimal trellis of a code. In this paper, 
certain properties concerning the state connectivity and branch 
complexity [9] of this nonminimal trellis are derived which 
demonstrate that the nonminimal trellis implementation would 
require less area in an IC implementation than the corresponding
minimal trellis when the ubiquitous add-compare-select
(ACS) array architecture [26]–[28] is used for implementation.
We caution that if a different architecture as proposed in [27] or 
[24] is chosen for implementation, then the trellis structure that 
is best suited will in general be different from the proposed 
trellis. 

The number of decoding operations required by the stan- 
dard trellis-based Viterbi decoding algorithm depends on the 
sectionalization of the trellis used for decoding. Most
previous work focused on uniform sectionalization of a
trellis, in which each section consists of the same number of code
symbols. However, [22] recently showed that nonuniform
sectionalization of a trellis often results in fewer
decoding operations than uniform sectionalization. They have
devised an efficient algorithm for finding optimal section- 
alization of a trellis for minimizing the total number of 
decoding operations required for maximum-likelihood (ML) 
trellis decoding. Optimal sectionalization of a trellis to mini- 
mize computational complexity is also investigated in [24]. In 
this paper, we only investigate good trellises with uniform 
sectionalization for IC implementation of Viterbi decoders. 
In particular, we are concerned with those structural properties, such
as parallel structure, regularity and state-connectivity, that:
1) affect the complexity of wire-routing (interconnections)
within the IC and the chip size and 2) facilitate parallel and
pipelined decoding to achieve high decoding speed.
Since nonuniform sectionalization of a trellis requires fewer
decoding operations, this advantage over uniform sectionalization,
together with its other properties, should be investigated
for IC implementation of Viterbi decoders to achieve high
decoding speed. This investigation is beyond the scope of this
paper.

Trellises for block codes are often loosely connected. A 
properly constructed trellis may consist of many parallel and 
structurally identical (isomorphic) subtrellises of smaller state 
space dimension without cross-connections between them. 
Consequently, identical Viterbi decoders of much smaller com- 
plexity can be devised to process the subtrellises independently 
in parallel without internal communication between them. This 
not only simplifies the IC implementation but also speeds 
up the decoding process. For example, the (32,16,8) RM 
code, also an extended BCH code, has a four-section, 64- 
state minimal trellis diagram, which consists of eight parallel 
and structurally identical eight-state subtrellises without cross- 
connections among them. As a result, we can devise eight 
identical eight-state Viterbi decoders to process the eight 
subtrellises in parallel without communication between them. 
At the end, there are eight survivors (one from each subtrellis) 
and the best one will be chosen as the decoded codeword. 
This reduces the implementation of a 64-state Viterbi decoder
to the implementation of an eight-state decoder that is
replicated eight times. This parallel structure reduces the wire-routing
and internal communications within the IC, which in turn reduces
chip size and improves decoding speed. If the state and
branch complexities of each subtrellis are small and the total
number of subtrellises is small, all the subtrellis decoders
can be put on a single chip, as for the (32,16,8) RM
code [29]. However, if the state and branch complexities are
large, then each subtrellis decoder (or several of them) can be
implemented on a single chip. This provides flexibility in chip
planning and decoder architecture.

The two fundamental bottlenecks to Viterbi decoding (de- 
coding speed) are the internal communications between ACS 
units and comparisons of incoming branches (radix-profile) at 
each state [28], [30]. Properly designed parallel structure in a 
trellis would overcome these obstacles without exceeding the 
maximum state space dimension of the minimal trellis. For 
example, a (64,40,8) RM subcode which is being considered 
by NASA for high-speed satellite communications has an 
eight-section 2048-state trellis. This trellis consists of 32 
parallel and structurally identical 64-state subtrellises. The last 
four sections of each subtrellis are a mirror image of the 
first four sections as shown in Fig. 3. As a result, a bidirec- 
tional decoding can be performed. Furthermore, the maximum 
component of the radix profile for each half subtrellis is 
only eight. A 64-state subtrellis decoder can be implemented
on a single chip in 0.5 μm complementary metal-oxide-semiconductor
(CMOS) technology which can operate at a
decoding speed of 600 Mbps [31]. Other structural properties of
the subtrellises for this (64,40,8) RM subcode which simplify
the IC implementation will be discussed later. Parallel structure,
therefore, offers simplification, flexibility and higher decoding
speed for IC implementation. We must note that the parallel
structure does not reduce the total number of single-state
processors, i.e., the number of ACS's.

In this paper, we investigate trellis structures, particularly 
the parallel structure, of linear block codes for implementation 
of Viterbi decoders capable of achieving high decoding speed 
while satisfying a constraint on the structural complexity of 
the trellis in terms of the maximum number of states at any 
depth. Only uniform sectionalizations of the code trellis are 
considered. The organization of the paper is as follows. 

In Section II, using the theory of L-section minimal trellis 
diagrams, an upper-bound on the number of parallel isomorphic
subtrellises in a proper trellis for a code without
exceeding the maximum state space dimension of the minimal
trellis of the code is derived. In Section III, we analyze
the trellises for all extended BCH and RM codes of lengths 
32 and 64. In Section IV, we define parameters related to 
the complexity of a Viterbi decoder IC using the ACS-array 
architecture for linear block codes. Section V treats examples 
and in Section VI we use the results of this paper to design a 
trellis for a (64,40) RM subcode. 

II. Trellises with Parallel Structure
for Linear Block Codes with Constraint 
on Maximum State Space Dimension 

The objective of this section is to show that we can build a 
trellis for a linear block code C which is a disjoint union of 
a certain desired number of parallel isomorphic subtrellises. 
Although this trellis is not minimal, its state space dimension 
at every depth is less than or equal to the maximum state 
space dimension of the minimal trellis. The conditions under 
which such a trellis construction is possible and an upper- 
bound on the number of such parallel subtrellises are derived. 
In some cases, the minimal trellis itself possesses a parallel 
structure. The number of such parallel subtrellises (if any) in 
the minimal trellis is derived. 

A. Preliminaries 

We consider only binary $(N, K, d_{\min})$ linear block codes.
Let $L, M$ be positive integers such that $LM = N$. The
minimal (up to graph isomorphism) $L$-section trellis is a
well understood graphical representation of the code [5], [9].
Let the sets of states at the end of each section be denoted
$S_0, S_M, \ldots, S_{LM}$. We define a sequence
$\{s_0, s_M, \ldots, s_{LM}\}$ called the state complexity profile (SCP)
of the trellis and given by $s_{iM} = \log_2(|S_{iM}|)$ for $0 \le i \le L$.
The minimal $L$-section trellis of a code $C$ has the property
that every component of its SCP is less than or equal to
the corresponding component in the SCP of any other proper
$L$-section trellis for $C$. The maximum among the $N + 1$
components in the SCP of the minimal $N$-section trellis
($L = N$, $M = 1$) for $C$ is denoted $s_{\max}(C)$ and we will
denote the maximum of the components in the SCP of the
minimal $L$-section trellis for $C$ as $s_{\max,L}(C)$. For a binary
$N$-tuple $v = (v_1, \ldots, v_N)$, let $p_{h,h'}[v]$ denote the $(h' - h)$-tuple
$(v_{h+1}, \ldots, v_{h'})$ and let $p_{h,h'}[C] = \{p_{h,h'}[c] : c \in C\}$.

Let $C_{h,h'}$ be the linear subcode of $C$ consisting of all
codewords whose components are all zero except for the
$(h' - h)$ components from the $(h + 1)$th bit position to the
$h'$th bit position.

In an $L$-section minimal trellis for a block code, there may
be a set of parallel branches between two adjacent states.
In such a case, we call the entire set of parallel branches a
composite branch. Each composite branch in the $i$th section,
$1 \le i \le L$, is made up of $2^{P_i}$ parallel branches where $P_i$
is the dimension of the subcode $C_{(i-1)M,iM}$ [9]. In
the $i$th section of an $L$-section trellis for a linear block code,
$1 \le i \le L$, the number of distinct branch metrics that have
to be computed is $2^{D_i}$ where $D_i$ is the dimension of the
code $p_{(i-1)M,iM}[C]$, and this number is much less than
the total number of branches. $D_i$ is the rank of the submatrix
formed by the $M$ columns from the $[(i-1)M + 1]$th to the
$(iM)$th column of the generator matrix of the code and is
upper-bounded by $M$. For $1 \le i \le L$, let the number of
composite branches merging into any state $s \in S_{iM}$ be $2^{\delta_{iM}}$
(it is the same for any state in $S_{iM}$). For an $L$-section trellis
for $C$, we define the converging branch profile (CBP) as the
ordered sequence $\{\delta_M, \delta_{2M}, \ldots, \delta_{LM}\}$. For $0 \le i < L$, let
the number of composite branches emanating from any state
$s \in S_{iM}$ be $2^{\lambda_{iM}}$ (it is the same for any state in $S_{iM}$). The
ordered sequence $\{\lambda_0, \lambda_M, \ldots, \lambda_{(L-1)M}\}$ is called the diverging
branch profile (DBP). Then $\delta$ and $\lambda$ are related as follows:

$$\delta_{iM} = s_{(i-1)M} + \lambda_{(i-1)M} - s_{iM}. \tag{1}$$

Based on the theory of L-section trellises [9], it can be shown 
that 

$$\lambda_{(i-1)M} = \dim(C_{(i-1)M,N}) - \dim(C_{iM,N}) - P_i \tag{2}$$

which implies that $\lambda_{(i-1)M}$ equals the number of rows of a
trellis oriented generator matrix of $C$ whose leading 1 occurs
among the positions $\{(i-1)M, (i-1)M + 1, \ldots, iM - 1\}$
and whose span is not contained in the $i$th section. These
dimensions can be easily determined from the trellis oriented
generator matrix of the code [9], [16], [20]. The two sequences
$\{\delta_M, \delta_{2M}, \ldots, \delta_{LM}\}$ and $\{\lambda_0, \lambda_M, \ldots, \lambda_{(L-1)M}\}$ provide a
measure of the state connectivity of an $L$-section minimal
trellis. In IC implementation of a Viterbi decoder, $\delta_{iM}$ is called
a radix number.
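The profiles defined above can be computed directly from the row spans of a trellis oriented generator matrix. The following is a minimal sketch, not taken from the paper: the function name `section_profiles` and the example spans are illustrative assumptions, bit positions are taken as 1-indexed (so section $i$ covers positions $(i-1)M+1$ through $iM$), and the Greek letters $\delta$ and $\lambda$ are one reading of the CBP and DBP symbols.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (leading-one position, trailing-one position), 1-indexed


def section_profiles(spans: List[Span], n: int, num_sections: int):
    """Return (SCP, P, DBP, CBP) of the minimal L-section trellis.

    spans: row spans of a trellis oriented generator matrix.
    n: code length N; num_sections: L, which must divide N.
    """
    if n % num_sections != 0:
        raise ValueError("the number of sections must divide the code length")
    m = n // num_sections
    # s_{iM}: number of rows whose active span [lead, trail - 1] contains iM.
    scp = [sum(1 for lead, trail in spans if lead <= i * m <= trail - 1)
           for i in range(num_sections + 1)]
    p_profile, dbp, cbp = [], [], []
    for i in range(1, num_sections + 1):
        first, last = (i - 1) * m + 1, i * m  # bit positions of section i
        starting = [(a, b) for a, b in spans if first <= a <= last]
        p_i = sum(1 for _, b in starting if b <= last)  # span inside section i
        lam = len(starting) - p_i                       # equation (2)
        delta = scp[i - 1] + lam - scp[i]               # equation (1)
        p_profile.append(p_i)
        dbp.append(lam)
        cbp.append(delta)
    return scp, p_profile, dbp, cbp


# Hypothetical input: spans of a 4-row, length-8 trellis oriented matrix.
example_spans = [(1, 4), (2, 7), (3, 6), (5, 8)]
scp, p_profile, dbp, cbp = section_profiles(example_spans, n=8, num_sections=4)
print(scp, p_profile, dbp, cbp)
```

For these spans with $L = 4$ the sketch gives SCP [0, 2, 2, 2, 0], DBP [2, 1, 1, 0] and CBP [0, 1, 1, 2], and the CBP and $P_i$ components together sum to the code dimension 4.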

B. Parallel Trellises 

Let $G$ be the trellis oriented generator matrix of an $(N, K)$
linear block code $C$ [4]. Let $r = (r_1, r_2, \ldots, r_N)$ be a typical
row of $G$. Then, we define the span of $r$, denoted $\mathrm{span}(r)$, to
be the smallest interval $[i, j]$, $1 \le i \le j \le N$, which contains
all the nonzero elements of $r$. For a row $r$ whose span is
$[i, j]$ we also define an active span of $r$, denoted $\mathrm{aspan}(r)$,
as $[i, j - 1]$ if $i < j$ and $\mathrm{aspan}(r) = \emptyset$ if $i = j$. The trellis
oriented matrix has the following properties: 1) the leading
one of every row occurs in an earlier position than the leading
one of the row below it and 2) the trailing one of every row
occurs at a different position from the trailing one of every
other row. Any other trellis oriented matrix for $C$ has the
same set of row spans although the rows themselves may be
different [20]. Let $T$ be the minimal $N$-section trellis for $C$.

Given the trellis oriented generator matrix of a code, the state
space dimension at any position $l$ is just equal to the number of
rows whose active span contains $l$ [20]. For example, consider
the following trellis oriented generator matrix:

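$$G = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{matrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{matrix}$$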

for which $\mathrm{aspan}(r_1) = [1, 3]$, $\mathrm{aspan}(r_2) = [2, 6]$,
$\mathrm{aspan}(r_3) = [3, 5]$ and $\mathrm{aspan}(r_4) = [5, 7]$. For each
$l$, $0 \le l \le 8$, counting the number of rows which are
active at that $l$ yields the state complexity profile (SCP),
$\{0, 1, 2, 3, 2, 3, 2, 1, 0\}$. For $0 \le l \le N$, let $s_l(C)$ denote
the dimension of the $l$th state space of $C$. Let $s_{\max}(C)$ be
the maximum among the state space dimensions. Define the
nonempty set

$$I_{\max}(C) = \{l : s_l(C) = s_{\max}(C)\}. \tag{3}$$
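The counting rule in this example is easy to check mechanically. The sketch below is illustrative only: the helper names are assumptions, the matrix is the four-row example above as read from the text, and every row is assumed to be nonzero, as in any generator matrix.

```python
from typing import List, Optional, Tuple


def active_span(row: List[int]) -> Optional[Tuple[int, int]]:
    """Active span [i, j - 1] of a row with span [i, j] (1-indexed); None if i == j."""
    ones = [pos for pos, bit in enumerate(row, start=1) if bit]
    lead, trail = ones[0], ones[-1]
    return (lead, trail - 1) if lead < trail else None


def state_complexity_profile(matrix: List[List[int]]) -> List[int]:
    """s_l for l = 0..N: the number of rows whose active span contains l."""
    n = len(matrix[0])
    aspans = [active_span(row) for row in matrix]
    return [sum(1 for a in aspans if a is not None and a[0] <= l <= a[1])
            for l in range(n + 1)]


# The example trellis oriented generator matrix (rows r_1 .. r_4).
G = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
scp = state_complexity_profile(G)
s_max = max(scp)
i_max = [l for l, s in enumerate(scp) if s == s_max]  # the set in (3)
print(scp, s_max, i_max)
```

For this matrix the profile is [0, 1, 2, 3, 2, 3, 2, 1, 0], so the maximum dimension 3 is reached at positions 3 and 5.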

Suppose we choose a subcode $C'$ of $C$ such that $\dim(C') =
\dim(C) - 1$ and the set of coset representatives $[C/C']$ is
generated by the single row $r \in G$. From the above statement
about $s_l(C)$, it is clear that $s_l(C') = s_l(C) - 1$ for exactly
those $l$ where $r$ is active, i.e., $l \in \mathrm{aspan}(r)$. For other positions
$l \notin \mathrm{aspan}(r)$ we have $s_l(C') = s_l(C)$. Hence, we have the
following proposition.

Lemma 1: If there exists a row $r$ in the trellis oriented
generator matrix $G$ for the code $C$ such that $\mathrm{aspan}(r) \supseteq
I_{\max}(C)$, then we can form a subcode $C'$ of $C$ generated by
$G - \{r\}$ such that $s_{\max}(C') = s_{\max}(C) - 1$ and
$I_{\max}(C') \supseteq I_{\max}(C)$. ■

In fact $I_{\max}(C') = I_{\max}(C) \cup \{l : s_l(C) = s_{\max}(C) - 1,\ l \notin
\mathrm{aspan}(r)\}$. Since $G$ is a trellis-oriented generator matrix,
$G' = G - \{r\}$ is also trellis-oriented. We can apply the
above proposition again to $C'$ if there exists a row $r' \in G'$
with $\mathrm{aspan}(r') \supseteq I_{\max}(C')$. This yields a subcode $C''$ with
dimension smaller by one and $s_{\max}(C'') = s_{\max}(C') - 1$. If no
such row $r'$ exists, the proposition cannot be applied and the
recursion stops. The above proposition can be generalized.
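The recursion just described can be sketched as a loop that keeps removing a row whose active span covers the current set of maximum-dimension positions. This is an illustration under stated assumptions rather than the authors' procedure: the name `peel_rows` is hypothetical, the first eligible row is chosen arbitrarily, every row is assumed to have a nonempty active span, and the input is the set of active spans [1,3], [2,6], [3,5], [5,7] of the length-8 example.

```python
from typing import Dict, List, Tuple

ASpan = Tuple[int, int]  # active span [first, last], 1-indexed


def scp(aspans: Dict[int, ASpan], n: int) -> List[int]:
    """State space dimension at each position l = 0..n."""
    return [sum(1 for a, b in aspans.values() if a <= l <= b)
            for l in range(n + 1)]


def peel_rows(aspans: Dict[int, ASpan], n: int):
    """Apply Lemma 1 repeatedly; return (removed row labels, s_max after each step)."""
    remaining = dict(aspans)
    removed: List[int] = []
    s_max_history = [max(scp(remaining, n))]
    while remaining:
        profile = scp(remaining, n)
        s_max = max(profile)
        i_max = [l for l, s in enumerate(profile) if s == s_max]
        # Rows whose active span contains every position of maximum dimension.
        candidates = [k for k, (a, b) in remaining.items()
                      if all(a <= l <= b for l in i_max)]
        if not candidates:
            break  # no eligible row: the recursion stops
        row = candidates[0]  # arbitrary choice among the eligible rows
        del remaining[row]
        removed.append(row)
        s_max_history.append(max(scp(remaining, n)))
    return removed, s_max_history


example = {1: (1, 3), 2: (2, 6), 3: (3, 5), 4: (5, 7)}
removed, history = peel_rows(example, n=8)
print(removed, history)
```

On this input rows 2 and 3 are removed in turn and the maximum state space dimension drops from 3 to 2 to 1, after which no remaining row covers all maximum-dimension positions.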

Let $R(C)$ be the following subset of rows of $G$:

$$R(C) = \{r \in G : \mathrm{aspan}(r) \supseteq I_{\max}(C)\}. \tag{4}$$

Let $\rho = |R(C)|$ where $|Q|$ denotes the cardinality of any
finite set $Q$.

Theorem 1: With $R(C)$ defined as above and $\rho = |R(C)|$,
let $1 \le \rho' \le \rho$. There exists a subcode $C'$ of $C$ such that
$s_{\max}(C') = s_{\max}(C) - \rho'$ and $\dim(C') = \dim(C) - \rho'$ if and
only if there exists a subset $R' \subseteq R(C)$ consisting of $\rho'$ rows
of $R(C)$ such that for every $l$ satisfying $s_l(C) > s_{\max}(C')$,
there exist at least $s_l(C) - s_{\max}(C')$ rows in $R'$ whose active
spans contain $l$. The set of coset representatives $[C/C']$ is
generated by $R'$.

Proof: Suppose $R' = \{r'_1, \ldots, r'_{\rho'}\}$ satisfies the
conditions in the hypothesis. Since $R' \subseteq R(C)$, $I_{\max}(C) \subseteq
\mathrm{aspan}(r'_i)$ for $1 \le i \le \rho'$. Consider the subcode $C'$ generated
by $G - R'$. For those $l \in I_{\max}(C)$, we can determine $s_l(C')$
by counting the number of rows $r \in (G - R')$ that are active
at the position $l$. But this number is exactly $\rho'$ less than $s_{\max}(C)$.
For $l \notin I_{\max}(C)$ and satisfying $s_l(C) > s_{\max}(C')$, we
are assured by the hypothesis that $s_l(C)$ will be reduced by
at least $s_l(C) - s_{\max}(C')$, thus guaranteeing that $s_{\max}(C') =
s_{\max}(C) - \rho'$.

Fig. 1. Parity check matrix in trellis-oriented form of the ex-BCH (32,21,6) code with an optimum order of bits with respect to trellis state complexity.

To prove the converse, let $C'$ be a subcode of $C$ whose
dimension is $\dim(C) - \rho'$ and satisfying $s_{\max}(C') = s_{\max}(C) -
\rho'$. Without loss of generality, we may let $C'$ be generated by
$G - R'$ for some subset $R'$ of the trellis-oriented generator
matrix $G$ of $C$ with $|R'| = \rho'$. Let $T$ be the minimal
trellis corresponding to $G$. Let $T'$ be the minimal trellis for
$C'$. Let $N_l(R')$ be the number of rows $r'$ in $R'$ such that
$l \in \mathrm{aspan}(r')$. Then, at every position $l$, $0 \le l \le N$, we have

$$s_l(T) = s_l(C') + N_l(R') \ge s_l(C) \tag{5}$$

since $s_l(C)$ is the smallest possible state space dimension.
Therefore

$$N_l(R') \ge s_l(C) - s_l(C') \ge s_l(C) - s_{\max}(C'). \tag{6}$$

For every $l$, at least $s_l(C) - s_{\max}(C')$ rows of $R'$ are
active. Also, for every $l \in I_{\max}(C)$, we have $N_l(R') \ge
s_{\max}(C) - s_{\max}(C') = \rho'$. So all the rows $r' \in R'$ satisfy
$\mathrm{aspan}(r') \supseteq I_{\max}(C)$. Thus $R' \subseteq R(C)$. ■

The utility of the above theorem is that it shows how to
choose a subcode $C'$ of $C$ with $s_{\max}(C') = s_{\max}(C) -
\dim([C/C'])$, such that one can build a nonminimal trellis
$T$ for $C$ with the following properties.

1) The maximum state space dimension of $T$ is $s_{\max}(C)$.

2) $T$ is the union of $2^{\dim([C/C'])}$ parallel isomorphic
subtrellises $T_i$ with each $T_i$ being isomorphic to the minimal
trellis for $C'$.

3) Upper-bound on parallelism: The smallest such subcode
has dimension lower-bounded by $\dim(C) - |R(C)|$, i.e.,
the maximum number of parallel subtrellises one can
obtain with the constraint that the state space dimension
never exceeds $s_{\max}(C)$ is upper-bounded by $2^{|R(C)|}$,
with $R(C)$ as defined above.

4) Parallelism of the minimal trellis: The logarithm to
the base two of the number of parallel isomorphic
subtrellises in a minimal $L$-section trellis for a binary
$(N, K)$ linear block code is given by the number of rows
in its trellis-oriented generator matrix whose active span
contains the integers $\{M, 2M, \ldots, (L - 1)M\}$ where
$N = LM$.
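Properties 3) and 4) both reduce to counting rows by their active spans. The sketch below is an assumption-laden illustration, not the authors' code: the function names are hypothetical, the bound is evaluated for the $N$-section profile, and the input is again the set of active spans [1,3], [2,6], [3,5], [5,7] of the length-8 example.

```python
from typing import List, Tuple

ASpan = Tuple[int, int]  # active span [first, last], 1-indexed


def log2_parallel_subtrellises(aspans: List[ASpan], n: int, num_sections: int) -> int:
    """Property 4: rows whose active span contains M, 2M, ..., (L-1)M."""
    if num_sections < 2 or n % num_sections != 0:
        raise ValueError("need at least two sections and L must divide N")
    m = n // num_sections
    boundaries = range(m, n, m)  # M, 2M, ..., (L-1)M
    return sum(1 for a, b in aspans if all(a <= l <= b for l in boundaries))


def log2_parallelism_bound(aspans: List[ASpan], n: int) -> int:
    """Property 3: |R(C)|, rows whose active span contains all of I_max(C)."""
    profile = [sum(1 for a, b in aspans if a <= l <= b) for l in range(n + 1)]
    s_max = max(profile)
    i_max = [l for l, s in enumerate(profile) if s == s_max]
    return sum(1 for a, b in aspans if all(a <= l <= b for l in i_max))


example = [(1, 3), (2, 6), (3, 5), (5, 7)]
print(log2_parallel_subtrellises(example, 8, 4))  # boundaries 2, 4, 6
print(log2_parallel_subtrellises(example, 8, 2))  # boundary 4
print(log2_parallelism_bound(example, 8))
```

For this example only the row with active span [2,6] covers the boundaries 2, 4 and 6, so the four-section minimal trellis splits into two parallel subtrellises, while the two-section trellis splits into four.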


TABLE I
Set of Row Spans of Trellis Oriented Generator
Matrix of (32,21,6) Extended and Permuted BCH Code

row-#   span       row-#   span
1       [1,?]      12      [12,20]
2       [2,15]     13      [13,20]
3       [3,13]     14      [14,22]
4       [4,14]     15      [15,27]
5       [5,12]     16      [17,24]
6       [6,18]     17      [18,31]
7       [7,21]     18      [19,29]
8       [8,25]     19      [20,30]
9       [9,16]     20      [21,28]
10      [10,23]    21      [25,32]
11      [11,19]

As an example, consider the extended and permuted
(32,21,6) BCH code. A parity check matrix for this code
with an optimum order of bits with respect to trellis state
complexity is shown in Fig. 1. The set of spans of any trellis
oriented generator matrix for this code is given in Table I.
The four-section minimal trellis has the SCP $\{0, 7, 9, 7, 0\}$,
giving $s_{\max,4}(C) = 9$. This trellis has two parallel isomorphic
subtrellises. $I_{\max}(C) = \{16\}$ and it can be verified that
$|R(C)| = 9$. In an attempt to build a trellis consisting
of 64 parallel subtrellises while satisfying the upper-bound
of nine on the maximum state space complexity, we let
$\rho' = 6$. So $s_{\max}(C) - \rho' = s_{\max}(C') = 3$. The set
$\{l : s_l(C) > s_{\max}(C')\} = \{8, 16, 24\}$. However, we find
that no subset $R'$ of $R(C)$ exists satisfying the conditions
in Theorem 1. Hence, we cannot build a trellis consisting
of 64 parallel subtrellises for this code without violating
the constraint on the maximum state space dimension. If
we choose $\rho' = 5$, then we can find a subset $R' =
\{r_6, r_7, r_8, r_{12}, r_{15}\} \subset R(C)$ that satisfies all the conditions
in Theorem 1. Hence, choosing the subcode $C'$ generated by
$G - R'$ we obtain a trellis $T$ for $C$ consisting of 32 parallel
isomorphic subtrellises. Each subtrellis is isomorphic to the
minimal trellis for $C'$, which has $s_{\max}(C') = 4$.
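The four-section claims in this example can be checked with a short brute-force search over subsets of $R(C)$. The sketch below is hedged in two important ways, and all names in it are illustrative. First, two span values are assumptions rather than data taken from Table I: row 1 is illegible in the table and is taken here as [1,8], and row 12 is taken as [12,26] instead of the printed [12,20]. This is the reading found to reproduce the stated SCP $\{0, 7, 9, 7, 0\}$ while keeping all trailing positions distinct. Second, the conditions of Theorem 1 are tested only at the section boundaries, as in the text.

```python
from itertools import combinations

# Row spans [lead, trail] read from Table I. ASSUMPTIONS: row 1 is illegible
# in the table and is taken as (1, 8); row 12 is taken as (12, 26) instead of
# the printed [12,20].
SPANS = {
    1: (1, 8), 2: (2, 15), 3: (3, 13), 4: (4, 14), 5: (5, 12), 6: (6, 18),
    7: (7, 21), 8: (8, 25), 9: (9, 16), 10: (10, 23), 11: (11, 19),
    12: (12, 26), 13: (13, 20), 14: (14, 22), 15: (15, 27), 16: (17, 24),
    17: (18, 31), 18: (19, 29), 19: (20, 30), 20: (21, 28), 21: (25, 32),
}
BOUNDARIES = [0, 8, 16, 24, 32]  # section boundaries of the four-section trellis


def active(row: int, l: int) -> bool:
    """True if position l lies in the active span [lead, trail - 1] of the row."""
    lead, trail = SPANS[row]
    return lead <= l <= trail - 1


def dim_at(l: int) -> int:
    """State space dimension s_l(C): the number of rows active at l."""
    return sum(active(row, l) for row in SPANS)


scp = [dim_at(l) for l in BOUNDARIES]
s_max = max(scp)
i_max = [l for l in BOUNDARIES if dim_at(l) == s_max]
R = [row for row in SPANS if all(active(row, l) for l in i_max)]
# Rows active at every interior boundary give the parallelism of the minimal trellis.
parallel_rows = [row for row in SPANS
                 if all(active(row, l) for l in BOUNDARIES[1:-1])]


def satisfies_theorem1(r_prime) -> bool:
    """Check the covering condition of Theorem 1 for a candidate subset R'."""
    if not set(r_prime) <= set(R):
        return False
    target = s_max - len(r_prime)  # s_max(C') = s_max(C) - rho'
    return all(sum(active(row, l) for row in r_prime) >= dim_at(l) - target
               for l in BOUNDARIES if dim_at(l) > target)


def feasible(rho_prime: int) -> bool:
    """True if some subset of R(C) with rho' rows satisfies Theorem 1."""
    return any(satisfies_theorem1(c) for c in combinations(R, rho_prime))


print(scp, i_max, len(R), parallel_rows)
print(satisfies_theorem1((6, 7, 8, 12, 15)), feasible(5), feasible(6))
```

Under these assumptions the sketch reproduces the figures quoted in the text: nine rows in $R(C)$, a single row active at all interior boundaries (hence two parallel subtrellises in the minimal trellis), a valid choice of five rows, and no valid choice of six.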

For the same code, the 32-section minimal trellis has
the SCP that gives $s_{\max,32}(C) = 10$ and $I_{\max}(C) =
\{12, 14, 18, 20\}$. Using Table I, we find that $|R(C)| =
2$. In an attempt to build a trellis consisting of four
parallel subtrellises while satisfying the upper-bound of
ten on maximum state space dimension, we let $\rho' = 2$.
So $s_{\max}(C') = 8$. The set $\{l : s_l(C) > s_{\max}(C')\} =
\{10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22\}$. We find that
the subset of two rows having spans [8,25] and [10,23]
satisfies the conditions in Theorem 1.

This decomposition of a trellis into parallel and structurally 
identical subtrellises of smaller state complexity without cross- 
connections between them has significant advantages for IC 
implementation of Viterbi decoding. Identical Viterbi decoders 
of much simpler complexity can be devised to process the 
subtrellises independently in parallel without internal com- 
munications (or information transfer) between them. Internal 
information transfer limits the decoding speed [28], [30]. 
Furthermore, the number of computations to be carried out 
per subtrellis is much smaller than that of a fully connected 
trellis. As a result, the parallel structure not only simplifies 
the decoding complexity but also speeds up the decoding 
process. For example, the (32,16,8) extended and permuted
BCH code (also an RM code) has a four-section trellis diagram
of 64 states. It can be decomposed into eight parallel and
structurally identical eight-state subtrellises without
cross-connections between them as shown in Fig. 2. As a result,
eight identical eight-state Viterbi decoders can be devised to
process the decoding in parallel. An IC implementation of a
Viterbi decoder for this code using a 0.8 μm CMOS technology
has been recently completed at the University of Hawaii VLSI
Design Center. The decoder is implemented in Xilinx field
programmable gate array (FPGA) chips [29]. The decoder is
capable of operating at a speed of 200 Mbps. Custom design of
this decoder using 0.5 μm CMOS technology can achieve
a decoding speed of 600 Mbps or higher.

III. Trellises of BCH and RM Codes 
of Lengths 32, 64 

Based on the theory developed in the previous section, 
an analysis of the parallel structure of the trellises for RM 
and extended binary BCH codes was carried out. The degree 
of parallelism and the state complexity both depend on the 
sectionalization of the trellis. In general, it is known that as 
the number of sections decreases, the state complexity also 
decreases but the branch complexity in each section increases. 
We consider all possible uniform sectionalizations in which
the number of parallel branches between two connected states
is at most two. For example, we consider only 64-, 32-, 16-
and eight-section trellises for the (64,42,8) RM code because
the four-section trellis has 32 parallel branches between any
two adjacent connected states. The reason is that from an
implementation and computational viewpoint, more than
two parallel branches between adjacent connected states are
disadvantageous.¹ The results of this analysis are presented
in Table IV. Some surprising results are observed. A 32-section
trellis for the (32,11,12) extended BCH code can be
constructed which consists of 512 parallel 2-state subtrellises.
An eight-section trellis for the (64,42,8) RM code can be
constructed which consists of 128 parallel 64-state subtrellises.
These results do not follow from the squaring construction for
RM codes or other previously published approaches. They also
provide the designer with a wide range of trellises from
which to choose. In the tables for each code and for each
possible choice of the number of sections $L$, the logarithm
to base two of the maximum number of parallel subtrellises
that can be obtained without exceeding the number of states
in the minimal trellis is denoted $P_{\max,L}$. The maximum state
space dimension of the $L$-section subtrellis for the subcode $C'$
is denoted $s_{\max,L}(C')$. The best known order of bit positions
with respect to state complexity of BCH codes of length 64
presented in [12], [25] was used to produce the tables.

IV. Issues in the IC Implementation of an 
L-Section Trellis-Based Viterbi Decoder 

In this section, five key factors affecting the decoding speed 
of a Viterbi decoder based on the minimal and nonmini- 
mal trellis are examined. The nonminimal trellis structure 
presented in this paper reduces the internal communication 
and allows independent parallel processing of the subtrellises 
while decreasing the complexity of a Viterbi decoder IC. 
We substantiate this claim through analysis in the following 
subsections. 

A. Effective Computational Complexity of L-Section Trellis 

We consider a Viterbi decoder IC based on an $L$-section
trellis with $M$ bits/section for a $(LM, K, d_{\min})$ block code
$C$. While many VLSI structures have been described for a
Viterbi decoder [26], [27], [32], the most widely implemented
structure is based on ACS where each abstract state in the
trellis diagram manifests itself as a physical ACS circuit
on the IC and the same ACS's are repeatedly used for all
depths in the trellis. The ACS's can be labeled ACS-$i$ for
$0 \le i < 2^{s_{\max,L}(C)}$.

Let $T_i$ be the time required to process section-$i$ of the
trellis. At time $t = 0$, the metrics of the ACS circuits
corresponding to the originating state of each parallel subtrellis
are initialized to zero. After $T_1$ units of time, at $t = T_1$,
the ACS-$i$ corresponding to state $s_i$ at the end of section-1,
for $0 \le i < |S_M(C)|$, has the metric of state $s_i \in S_M$. The
index of the surviving branch into $s_i$ is also stored in ACS-$i$.
Continuing in this way, at time $t = T_1 + \cdots + T_l$, $1 \le l \le L$,
ACS-$i$ corresponding to state $s_i$, $0 \le i < |S_{lM}(C)|$, will have
the metric of $s_i \in S_{lM}(C)$ and a sequence of $l$ survivor
branch indices corresponding to the most likely path from
the originating state (of the subtrellis to which $s_i$ belongs)
to $s_i \in S_{lM}$.

¹ When there are exactly two parallel branches with complementary labels,
the correlation metric for one branch is the negative of the other and hence can
be obtained by a mere sign inversion.

There are as many ACS’s as the maximum number of states 
at any depth in the L-section trellis for the linear block code. 
In the minimal trellis, whenever the decoder is processing 
the trellis at a depth at which the state size is less than the 
maximum state size, a number of ACS circuits are idle and 
the hardware utilization efficiency is poor. In the nonminimal 
L-section trellis, the utilization of the ACS circuits that exist
in the IC is improved. Since all the subtrellis decoders operate 
independently in parallel, from the standpoint of speed, the 
effective computational complexity of decoding a single block 
(a received vector) is defined as the computational complexity 
of a single parallel subtrellis (viz. the minimal trellis for the 
subcode C') plus the cost of the final comparison among the 
choices (survivors) presented by each of the subtrellises. The 
time required for the final comparison is small relative to the 
time required for decoding a subtrellis and this comparison 
can be pipelined. Since subtrellises are processed in parallel, 
the speed of operation is limited only by the time required to 
process a subtrellis. 

Note that both the minimal and nonminimal trellises require 
the same number of ACS circuits. However, the nonminimal 
trellis has a larger number of parallel subtrellises as compared 
to the minimal trellis (which often has none). Hence decoding 
using the nonminimal trellis with proper structure is faster 
compared to that using the minimal trellis. Therefore, a system 
bit rate specification which earlier could be met only by 
the use of some P number of Viterbi decoders operating 
simultaneously in parallel can be met with much fewer than P 
Viterbi decoders. In this manner, the effective computational 
complexity is a factor affecting the reduction in hardware 
complexity of an overall decoder. 

B. Complexity of the ACS Circuit 

The CBP, defined by the number of branches merging into
a state at each particular depth, also affects decoding speed
and implementation complexity. This is called radix in IC
literature. Let $\delta_{iM}(C)$, $1 \le i \le L$, be the CBP of the minimal
trellis for $C$ with trellis oriented generator matrix $G$. At depth
$i$, $1 \le i \le L$, the ACS circuits have to perform at least $\delta_{iM}$
stages of tree-type [33] two-way comparisons to find the best
incoming branch. Hence reduction of the converging branch
profile will improve the speed of decoding and reduce the
complexity of each ACS circuit. We now show that none of the
components in the CBP of the nonminimal trellis is increased.
As will be shown by examples in Section V, most of the
components of the CBP are decreased considerably.
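The link between the radix and the number of comparison stages can be illustrated with a small software model of one ACS operation. This is only a sketch, not a description of the circuit in [33]: the function name `acs_tree` is hypothetical, a larger metric is assumed to be better (as with a correlation metric), and ties keep the lower-indexed branch.

```python
from typing import List, Tuple


def acs_tree(state_metrics: List[float], branch_metrics: List[float],
             maximize: bool = True) -> Tuple[int, float, int]:
    """Add-compare-select over 2**delta incoming branches.

    Returns (survivor index, survivor metric, number of comparison stages).
    """
    if len(state_metrics) != len(branch_metrics):
        raise ValueError("one branch metric is needed per incoming branch")
    count = len(state_metrics)
    if count == 0 or count & (count - 1):
        raise ValueError("the number of incoming branches must be a power of two")
    # Add: path metric of every incoming branch.
    candidates = [(i, s + b)
                  for i, (s, b) in enumerate(zip(state_metrics, branch_metrics))]
    stages = 0
    # Compare-select: each pass halves the field with two-way comparisons.
    while len(candidates) > 1:
        winners = []
        for a, b in zip(candidates[0::2], candidates[1::2]):
            a_wins = a[1] >= b[1] if maximize else a[1] <= b[1]
            winners.append(a if a_wins else b)
        candidates = winners
        stages += 1
    index, metric = candidates[0]
    return index, metric, stages


result = acs_tree([3.0, 1.5, 4.0, 2.0], [0.5, 2.5, -1.0, 1.0])
print(result)
```

With four incoming branches the model needs two comparison stages, matching a CBP component of 2; halving the number of incoming branches removes one stage from the critical path.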

Consider a nonminimal trellis for $C$ obtained as the union
of two parallel subtrellises each isomorphic to the minimal
trellis for $C'$, a subcode of $C$ generated by $G - \{r\}$, $r \in G$.
Let $\delta_{iM}(C')$, $1 \le i \le L$, be the CBP of the minimal trellis
for $C'$. Recall that $s_l(C') = s_l(C)$ if $l \notin \mathrm{aspan}(r)$ and
$s_l(C') = s_l(C) - 1$ if $l \in \mathrm{aspan}(r)$. By (2), $\lambda_{(i-1)M}(C') \in
\{\lambda_{(i-1)M}(C), \lambda_{(i-1)M}(C) - 1\}$. By (1), $\delta_{iM}(C') > \delta_{iM}(C)$
only if $s_{(i-1)M}(C') = s_{(i-1)M}(C)$ and $s_{iM}(C') = s_{iM}(C) - 1$.
But in this case, $(i-1)M \notin \mathrm{aspan}(r)$ and $iM \in \mathrm{aspan}(r)$.
So $\lambda_{(i-1)M}(C') = \lambda_{(i-1)M}(C) - 1$. Therefore, $\delta_{iM}(C') = \delta_{iM}(C)$.

C. Traceback Complexity 

Consider the problem of traceback to determine the best 
path through the trellis. In the minimal trellis, the ACS-$i$ 
corresponding to state $s_i \in S_{lM}$ has to store $\delta_{lM}(C) + 
\rho_l(C)$ bits in order to identify which of the $2^{\delta_{lM}(C)}$ composite 
branches merging into $s_i$, and which of the $2^{\rho_l(C)}$ 
parallel branches that form a composite branch, survives. 
Therefore, in the minimal trellis, each ACS-$i$ needs to store 
$\sum_{l=1}^{L} (\delta_{lM}(C) + \rho_l(C)) = \dim(C)$ bits in order to identify the 
sequence of surviving incoming branches. In the nonminimal 
trellis, the storage in number of bits required for each ACS-$i$ 
is $\sum_{l=1}^{L} (\delta_{lM}(C') + \rho_l(C')) = \dim(C')$, where $C'$ is the 
subcode of $C$ corresponding to the subtrellis. Since $\dim(C') = 
\dim(C) - p_{\max,L}(C)$, the ACS’s in the nonminimal trellis 
design require less storage than in the minimal trellis. The 
combined savings in storage over all the ACS circuits 
is significant. 
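
For a concrete sense of scale, the sketch below evaluates the per-ACS traceback storage for the (32,21,6) BCH code of Section V, whose nonminimal trellis uses 32 subtrellises of a (32,16) subcode and 512 ACS circuits. The variable names are illustrative, and the total simply multiplies the per-ACS saving by the number of ACS circuits.

```python
from math import log2

dim_c = 21                 # dim(C) for the (32,21,6) BCH code
num_subtrellises = 32      # parallel isomorphic subtrellises (Section V)
num_acs = 512              # ACS circuits in either implementation

p = int(log2(num_subtrellises))   # log2 of the number of subtrellises
dim_c_sub = dim_c - p             # dim(C') of the (32,16) subcode

bits_minimal = dim_c              # bits stored per ACS, minimal trellis
bits_nonminimal = dim_c_sub       # bits stored per ACS, nonminimal trellis
total_saving = num_acs * (bits_minimal - bits_nonminimal)
```

Under these figures each ACS stores five fewer bits, and the saving accumulates over the whole array.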

D. ACS - Connectivity 

The basic operations performed by an ACS circuit are: addition 
of the branch metrics of the incoming branches to the state 
metrics of the corresponding originating states, comparison of 
the resulting sums to find the best, and selection of the surviving 
sum as the new state metric together with the corresponding surviving 
branch label. The ACS-array architecture is dominated by 
the area required by the interconnections to transfer the state 
metrics [27]. For a state $s_i \in S_{lM}$, $0 \le i < 2^{s_{lM}(C)}$, $0 \le l < L$, 
let $A_l(s_i)$ denote the set of states in $S_{(l+1)M}$ that are adjacent 
to $s_i$. Let $A_l(s_i) = \emptyset$ if $i \ge |S_{lM}|$. Then in the ACS-array 
implementation of the Viterbi decoder based on the minimal 
trellis, a path to transfer the state metric must exist between 
ACS-$i$ and all ACS circuits that correspond to states in 

$A_0(s_i) \cup A_1(s_i) \cup \cdots \cup A_{L-1}(s_i)$. 

The above set defines the connectivity of ACS-$i$ in the ACS 
array corresponding to state $s_i \in S_{lM}$. The connectivity of the 
ACS’s corresponding to states in the minimal trellis results 
in a large amount of area in the VLSI chip being used for 
wiring [26], [27]. By contrast, in the implementation of 
a Viterbi decoder based on the nonminimal trellis, the ACS 
circuits can be divided into blocks [31] such that the ACS’s 
corresponding to states in a single subtrellis form a block. A 
particular ACS-$i$ needs to transfer its metric only to a subset 
of the ACS’s within its own block. This reduced connectivity 
results in a reduction of hardware complexity and wiring area. 
The maximum connectivity of ACS-$i$ is upper-bounded by 
$2^{s_{\max,L}(C')}$, the maximum number of states at any depth of a 
subtrellis, in the nonminimal trellis implementation. 
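
The definition of connectivity can be made concrete with a small sketch. The toy adjacency data below is hypothetical (it does not come from any code in this paper); `adjacency[l][i]` plays the role of the set of states adjacent to state $i$ at depth $l$, and a missing entry stands for the empty set.

```python
def connectivity(adjacency, i):
    """Union over all depths l of the states adjacent to state i.

    adjacency: list over depths of dicts {state index: set of next-state indices}.
    """
    connected = set()
    for depth_map in adjacency:
        connected |= depth_map.get(i, set())   # empty set if state i is absent
    return connected


# Hypothetical 3-section toy trellis with 1, 2, 2, 1 states.
toy = [
    {0: {0, 1}},             # depth 0 -> depth 1
    {0: {0, 1}, 1: {0, 1}},  # depth 1 -> depth 2
    {0: {0}, 1: {0}},        # depth 2 -> depth 3
]
conn0 = connectivity(toy, 0)
conn1 = connectivity(toy, 1)
```

The size of the returned set is the number of ACS circuits to which ACS-i must be able to send its state metric.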

E. Branch Complexity 

The number of distinct branch metrics that have to be 
computed in section-$i$ of the trellis is a property of the code 
and is unaltered by the parallelization of the trellis. Most IC 
decoders have a branch metric computational unit where all the 
branch metrics are calculated and then transferred to the ACS 
circuits [26], [27]. Because of the interconnection of branches 
between states in the trellis, routing the branch metrics to each 
of the ACS circuits requires a large amount of chip area. The 
trellises we describe show improvement over the minimal 
$L$-section trellis on this count because each subtrellis requires 
only a subset of the set of branch metrics in section-$i$ of the 
trellis. 

Parallelization of the minimal trellis as described in Section 
II may lead to a larger number of total computations being 
performed in decoding. The number of emanating branches 
in section-$(l + 1)$ is $|S_{lM}|\,2^{\lambda_{lM}}$, which may be larger than 
the corresponding product for the minimal trellis for some 
values of $l$, $0 \le l < L$. However, as explained above, the 
hardware complexity of the decoder is not affected. We 
illustrate the reason with an example: The RM (64,42,8) 
code has a minimal trellis with the $s_{lM} + \lambda_{lM}$ sequence 
{7,13,16,16,16,16,13,7}. The same sequence for the 
nonminimal trellis is {13,16,16,16,16,16,16,13}, which is 
larger at positions {0,1,6,7}. Consider the case when $l = 1$ 
(other cases are similar). In section 2 of the minimal trellis, 
each of the 128 ACS’s corresponding to states at the end of 
section 1 has 64 branches emanating from it. In section 2 of the 
nonminimal trellis, each of the 8192 ACS’s has eight branches 
emanating from it. Hence, the number of operations performed 
per ACS is smaller in the nonminimal trellis. Larger 
values of $|S_{lM}|\,2^{\lambda_{lM}}$ therefore represent a larger number of operations 
performed simultaneously in parallel by all the ACS’s in the 
nonminimal trellis. 
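
The numbers in this example can be checked with a few lines of Python. The sketch takes the state counts at the end of section 1 (128 and 8192) from the text and recovers the branch exponent by subtracting the state dimension from the quoted sum; the variable names are our own.

```python
from math import log2

# Sum of state dimension and emanating-branch dimension, for l = 0..7
sum_minimal = [7, 13, 16, 16, 16, 16, 13, 7]
sum_nonminimal = [13, 16, 16, 16, 16, 16, 16, 13]

l = 1
s_minimal = int(log2(128))       # 128 ACS's at the end of section 1
s_nonminimal = int(log2(8192))   # 8192 ACS's in the nonminimal trellis

branches_per_acs_min = 2 ** (sum_minimal[l] - s_minimal)
branches_per_acs_non = 2 ** (sum_nonminimal[l] - s_nonminimal)

total_min = 2 ** sum_minimal[l]      # all branches in section 2
total_non = 2 ** sum_nonminimal[l]

larger_positions = [k for k in range(8) if sum_nonminimal[k] > sum_minimal[k]]
```

The total number of branches in section 2 grows by a factor of eight, while the work per ACS drops from 64 branches to eight.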

V. Examples 

Consider the (32,21,6) extended and permuted BCH 
code. The minimal four-section, 8-bits/section trellis has 
SCP {0,7,9,7,0}. A nonminimal four-section trellis 
can be obtained as the union of 32 parallel isomorphic 
subtrellises each having SCP {0,4,4,4,0}. Thus, Viterbi 
decoder implementations using the ACS-array architecture 
for both trellises will require 512 ACS circuits. However, in 
the minimal trellis, each ACS will require the capability of 
choosing the best among 64 incoming branches, whereas the 
corresponding number is only 16 in the nonminimal trellis. 
The problem of routing metrics is also much reduced, since the 
connectivity of ACS-0 is 128 and that of ACS-$i$ is at least 64 
for $1 \le i \le 511$, while the maximum connectivity of any ACS 
in the nonminimal trellis is only 16. The structural parameters 
of each of these trellises are summarized in Tables II and III. 

Assuming each real number to be quantized to 8 bits, the 
VLSI layouts of a radix-8 ACS and a radix-16 ACS were 
generated. A modified form of the bit-level pipelined ACS 
architecture [33] was used for the ACS’s. The area required 
for the radix-16 ACS was 2.7 times that required for the 
radix-8 ACS. Assuming a factor of 2.5 increase in area per 
doubling of the radix, we see that 128 ACS’s in the minimal 
trellis have an area 6.25 times larger than their counterparts in 
the nonminimal trellis implementation. The remaining ACS’s 
require the same area. We see that the device area is reduced 
by adopting the proposed trellis architecture. Furthermore, the 
reduction in ACS-connectivity will yield a significant reduction 
in wiring area. The savings in hardware complexity and the 
increase in speed due to the nonminimal trellis approach easily 
outweigh the extra cost of the final comparison among the 
32 choices (one from each of the subtrellises) to find the best 
codeword. 

TABLE II 
Parameters of Four-Section Trellis of 
(32,21,6) Extended and Permuted BCH Code 

i        0    1    2    3    4 
SCP      0    7    9    7    0 
CBP      -    0    4    6    7 
EBP      7    6    4    0    - 

i = ACS-#     Connectivity of ACS-i 
0             128 
1 - 511       64 

TABLE III 
Parameters of Four-Section Trellis of (32,16) Subcode 
of the (32,21,6) Extended and Permuted BCH Code 

i        0    1    2    3    4 
SCP      0    4    4    4    0 
CBP      -    0    4    4    4 
EBP      4    4    4    0    - 

i = ACS-#     Connectivity of ACS-i 
0 - 511       16 
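
The ACS count and the area ratio quoted for the (32,21,6) BCH code follow from the state profiles and the radices. The sketch below reproduces them under the stated assumption of a factor of 2.5 in area per doubling of the radix; the variable names are our own.

```python
from math import log2

scp_minimal = [0, 7, 9, 7, 0]        # SCP of the minimal four-section trellis
scp_subtrellis = [0, 4, 4, 4, 0]     # SCP of each of the 32 subtrellises
num_subtrellises = 32

# The ACS array needs one ACS per state at the widest depth.
acs_minimal = 2 ** max(scp_minimal)
acs_nonminimal = num_subtrellises * 2 ** max(scp_subtrellis)

radix_minimal = 64       # incoming branches per ACS, minimal trellis
radix_nonminimal = 16    # incoming branches per ACS, nonminimal trellis
area_per_doubling = 2.5  # assumed growth in ACS area per radix doubling

doublings = int(log2(radix_minimal // radix_nonminimal))
area_ratio = area_per_doubling ** doublings
```

Both implementations need 512 ACS circuits, and the two radix doublings account for the factor of 6.25.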


VI. Trellis for a (64,40,8) Subcode of RM (64,42,8) 

A (64,40,8) subcode of the RM (64,42,8) code has been proposed to 
NASA for use as the inner code in a concatenated coding system 
with the NASA standard (255,223,33) Reed-Solomon code as 
the outer code [31]. This RM subcode achieves a 5.3 dB coding 
gain over uncoded binary phase shift keying (BPSK) at a bit- 
error rate (BER) of $10^{-6}$. The required speed of decoding is 
$960 \times 10^6$ BPSK symbols/s, which translates to an information 
bit rate of 600 Mbps. The coding gain is 0.5 dB less than the 
coding gain of a similar scheme with the same outer code but 
the NASA standard rate-1/2, 64-state convolutional code [34] 
as the inner code. However, the (64,40) RM subcode has a 
higher rate of 0.625 b/symbol than that of the convolutional 
code and thus requires less bandwidth. More significant is 
the fact that a Viterbi decoder for the (64,40,8) inner code 
can be designed to operate at higher data rates than that for 
the convolutional code by using the parallelism of the trellis of 
the RM subcode. The trellis for the NASA standard 64-state 
convolutional code does not consist of parallel subtrellises. 
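
The quoted information bit rate follows directly from the code rate. A quick check, with variable names chosen for illustration:

```python
from fractions import Fraction

n, k = 64, 40                       # (64,40) RM subcode
symbol_rate = 960 * 10**6           # BPSK symbols per second

code_rate = Fraction(k, n)          # information bits per BPSK symbol
info_bit_rate = symbol_rate * code_rate   # information bits per second
conv_rate = Fraction(1, 2)          # rate-1/2 convolutional code
```

Exact rational arithmetic is used here only to avoid rounding in the rate comparison.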

TABLE IV 
Maximum Parallelization of Trellises for 
all RM and BCH Codes of Length 32, 64 

                                    No. of Sections L 
No.  Code            Parameter      64   32   16    8    4    2 
 1   RM(32,6,16)     p_max,L(T)      -    4    4    4    3    4 
                     s_max,L(C')     -    1    1    1    1    0 
 2   BCH(32,11,12)   p_max,L(T)      -    9    9    9    7    - 
                     s_max,L(C')     -    1    1    1    2    - 
 3   RM(32,16,8)     p_max,L(T)      -    5    4    5    3    - 
                     s_max,L(C')     -    4    4    3    3    - 
 4   BCH(32,21,6)    p_max,L(T)      -    2    4    4    5    - 
                     s_max,L(C')     -    8    6    6    4    - 
 5   RM(32,26,4)     p_max,L(T)      -    1    1    2    -    - 
                     s_max,L(C')     -    4    4    3    -    - 
 6   RM(64,7,32)     p_max,L(T)      5    5    5    5    4    5 
                     s_max,L(C')     1    1    1    1    1    0 
 7   BCH(64,10,28)   p_max,L(T)     10   10   10   10   10   10 
                     s_max,L(C')     0    0    0    0    0    0 
 8   BCH(64,16,24)   p_max,L(T)     14   14   14   12   13   14 
                     s_max,L(C')     1    1    1    2    1    0 
 9   BCH(64,18,22)   p_max,L(T)     16   16   16   14   16   16 
                     s_max,L(C')     1    1    1    2    2    2 
10   RM(64,22,16)    p_max,L(T)      9    9    8    9    6    - 
                     s_max,L(C')     5    5    5    4    4    - 
11   BCH(64,24,16)   p_max,L(T)     11   11   10   11    8    - 
                     s_max,L(C')     5    5    5    4    4    - 
12   BCH(64,30,14)   p_max,L(T)     15   13   14   11   14    - 
                     s_max,L(C')     6    7    6    7    4    - 
13   BCH(64,36,12)   p_max,L(T)     10    9   10    9    8    - 
                     s_max,L(C')    10   10    9    8    8    - 
14   BCH(64,39,10)   p_max,L(T)      7    8    9   10   11    - 
                     s_max,L(C')    13   12   11    9    8    - 
15   RM(64,42,8)     p_max,L(T)      5    6    5    7    -    - 
                     s_max,L(C')     9    8    8    6    -    - 
16   BCH(64,45,8)    p_max,L(T)      2    3    4    4    -    - 
                     s_max,L(C')    12   11   10    9    -    - 
17   BCH(64,51,6)    p_max,L(T)      1    1    1    2    -    - 
                     s_max,L(C')    11   11   11   10    -    - 
18   RM(64,57,4)     p_max,L(T)      1    1    1    2    -    - 
                     s_max,L(C')     5    5    5    4    -    - 

Let $C$ denote the RM (64,42,8) code and $C'$ a (64,40) sub- 
code of $C$. If the $L$-section trellis on which decoding is based 
is composed of a union of $P$ parallel isomorphic subtrellises, 
then the effective computational complexity, denoted $\Lambda_{\mathrm{eff}}(L)$, 
is merely that of a single subtrellis plus the cost of obtaining 
the final decision by comparing the outputs of each of the $P$ 
Viterbi decoders. The value of $L$ which minimizes $\Lambda_{\mathrm{eff}}(L)$ 
with the constraint that the $L$-section trellis $T$ have a maximum 
state complexity not greater than $s_{\max,L}(C')$ [which is different 
from $s_{\max,L}(C)$] is determined. Note that $s_{\max,L}(C')$ is a function 
of the choice of the subcode, and we will choose the subcode 
which has the least $s_{\max,L}(C')$ for each $L$. The complexity of 
each addition, subtraction, and comparison is assumed to be 
equal to one addition equivalent operation. 

In the following, the trellis diagrams of various section- 
alizations for this RM subcode are given. Their effective 
computational complexities are computed. 

A. L = 4, M = 16 

Let $C_0 = (16,15,2)$, $C_1 = (16,11,4)$, and $C_2 = (16,5,8)$ 
be the corresponding RM codes, $G_i$ a generator matrix of $C_i$, 
and $G_{i/j}$ a generator matrix for the set of coset representatives 
$[C_i/C_j]$. Let $\otimes$ denote the Kronecker product. For $L = 4$, the 
RM (64,42) code has a minimal trellis corresponding to the 
2-level squaring construction with a state complexity profile 
(SCP) {0,10,10,10,0} ($s_{\max,4}(C) = 10$) and trellis-oriented 
generator matrix 

$$G = (1\ 1\ 1\ 1) \otimes G_{0/1} + \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \otimes G_{1/2} + \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \otimes G_2.$$ 

Fig. 3. An eight-section, 64-state subtrellis for the (64,35,8) subcode of the (64,40,8) RM subcode. 

Fig. 4. Sequence for decoding using concurrent bidirectional execution sequence. 


In order to obtain a (64,40) subcode $C'$, one can delete any 
two of the 42 rows above, giving a generator matrix for $C'$. The 
maximum state space complexity $s_{\max,4}(C')$ of the resulting 
code depends on which two rows we delete. It is easy to see 
that in order to have the least $s_{\max,4}(C')$, which equals 8, we 
must delete any two of the four rows among $(1\ 1\ 1\ 1) \otimes G_{0/1}$, 
obtaining an SCP of {0,8,8,8,0} ($s_{\max,4}(C') = 8$). Using 
the theory developed earlier, it can be seen that we can 
obtain at most four parallel subtrellises in any four-section 
trellis for $C'$ without exceeding the allowable $s_{\max,4}$ of eight. 
The effective computational complexity may be computed to 
give $\Lambda_{\mathrm{eff}}(4) = 39682$ addition equivalent operations for the 
four-section trellis. 


B. L = 8, M = 8 

Let $C_0 = (8,8,1)$, $C_1 = (8,7,2)$, $C_2 = (8,4,4)$, and $C_3 = 
(8,1,8)$ be RM codes. For $L = 8$, the RM (64,42,8) code 
has a minimal trellis (with two parallel subtrellises) corre- 
sponding to the 3-level squaring construction with an SCP 
{0,7,10,13,10,13,10,7,0} ($s_{\max,8}(C) = 13$) and a trellis 
oriented generator matrix $G$. 

The (64,40) subcode $C'$ with the best SCP is obtained 
by deleting the row formed with $G_{0/1}$ and one among 
the three rows $r_1 \otimes G_{1/2}$. This code $C'$ has SCP 
{0,6,8,11,8,11,8,6,0} ($s_{\max,8}(C') = 11$). Repeating a 
similar analysis, it is seen that one can obtain at most 
32 parallel subtrellises in any eight-section trellis for $C'$ 
without exceeding the maximum allowable state space 
complexity of $s_{\max,8}(C') = 11$. Each subtrellis has the 
SCP {0,6,6,6,3,6,6,6,0}, and from knowledge of its 
trellis structure the effective complexity is $\Lambda_{\mathrm{eff}}(8) = 12822$ 
addition equivalent operations. 

Fig. 5. Block diagram of overall decoder with 32 Viterbi decoders. 


C. L = 16, M = 4 

Let $C_0 = C_1 = (4,4,1)$, $C_2 = (4,3,2)$, $C_3 = (4,1,4)$, and 
$C_4 = (4,0,\infty)$ be RM codes. For $L = 16$, the 
RM (64,42,8) code has a trellis oriented generator matrix 
given by 

$$G = G_{\mathrm{RM}(16,5,8)} \otimes G_{1/2} + G_{\mathrm{RM}(16,11,4)} \otimes G_{2/3} + G_{\mathrm{RM}(16,15,2)} \otimes G_{3/4} \qquad (9)$$ 

where $G_{\mathrm{RM}(n,k,d)}$ denotes a trellis oriented generator matrix 
for the $(n,k,d)$ RM code. For $L = 16$, the RM (64,42,8) 
code has a minimal trellis (with no parallel subtrellises) 
corresponding to the four-level squaring construction with an 
SCP {0,4,7,10,10,13,13,13,10,13,13,13,10,10,7,4,0} 
($s_{\max,16}(C) = 13$). The (64,40) subcode $C'$ with the best 
SCP is generated by $G' = G - \{r_1 \otimes G_{1/2}, r_2 \otimes G_{1/2}\}$, 
where $r_1$ and $r_2$ are the two rows with spans [2,15] 
and [3,14] in the trellis oriented generator matrix for 
RM (16,11,4). The SCP of the minimal trellis for $C'$ is 
{0,4,6,8,8,11,11,11,8,11,11,11,8,8,6,4,0} ($s_{\max,16}(C') = 
11$). By analysis, one can obtain at most 8 parallel subtrellises 
in any 16-section trellis for $C'$ without exceeding the 
allowable $s_{\max,16}(C')$ of 11. Each subtrellis has SCP 
{0,4,6,8,6,8,8,8,5,8,8,8,6,8,6,4,0}. The resulting 
effective computational complexity is $\Lambda_{\mathrm{eff}}(16) = 23174$ 
addition equivalent operations. 
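
One way to sanity-check the reported sectionalizations is to verify that the parallel subtrellises together never use more states than the allowed maximum. The sketch below does this for the eight- and 16-section cases, under our reading that a trellis made of $P$ parallel subtrellises has $P$ times the states of one subtrellis at each interior depth; the helper name `within_budget` is hypothetical.

```python
from math import log2

def within_budget(num_subtrellises, scp_subtrellis, s_max_allowed):
    """True if P * 2**s never exceeds 2**s_max_allowed at any interior depth."""
    p = int(log2(num_subtrellises))
    interior = scp_subtrellis[1:-1]   # initial and final states are shared
    return all(p + s <= s_max_allowed for s in interior)


scp_8 = [0, 6, 6, 6, 3, 6, 6, 6, 0]
scp_16 = [0, 4, 6, 8, 6, 8, 8, 8, 5, 8, 8, 8, 6, 8, 6, 4, 0]

ok_8 = within_budget(32, scp_8, 11)    # 32 subtrellises, L = 8
ok_16 = within_budget(8, scp_16, 11)   # 8 subtrellises, L = 16
```

In both cases the widest depth uses the full budget of $2^{11}$ states, which is consistent with the stated maximum number of subtrellises.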

D. L = 32, M = 2 and L = 64, M = 1 

When $L = 32$, $s_{\max,32}(C') = 12$. The number 
of parallel isomorphic subtrellises possible without exceeding 
the allowable $s_{\max,32}(C') = 12$ in any 32-section trellis for 
the (64,40) subcode $C'$ is at most 4. So $\Lambda_{\mathrm{eff}}(32) \ge 37476$. 
When $L = 64$, $s_{\max,64}(C') = 12$. Furthermore, no paral- 
lel subtrellises are possible without exceeding the allowable 
$s_{\max,64}(C') = 12$. Hence $\Lambda_{\mathrm{eff}}(64) = 198000$. 
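
Collecting the effective complexities reported in this section makes the choice of sectionalization explicit. In the sketch below the value for $L = 32$ is only a lower bound, as noted in the text, and the dictionary name is our own.

```python
# Effective computational complexity (addition equivalent operations)
# reported for each number of sections L of the (64,40) RM subcode.
effective_complexity = {
    4: 39682,
    8: 12822,
    16: 23174,
    32: 37476,    # lower bound only
    64: 198000,
}

best_L = min(effective_complexity, key=effective_complexity.get)
section_length = 64 // best_L        # M, bits per section
```

Because the entry for $L = 32$ is a lower bound that already exceeds the eight-section value, the comparison is unaffected by its exact value.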

From the above analysis, we see that the eight-section trellis 
for the (64,40) RM subcode results in the least effective 
complexity. A VLSI implementation of a high-speed decoder 
for the (64,40) RM subcode is under way. The decoder is 
based on the eight-section trellis which is a union of 32 
parallel isomorphic subtrellises with a maximum of 64-states 
each. A schematic of the subtrellis is shown in Fig. 3. Note 
that the last four sections of the subtrellis form a mirror 
image of the first four sections. This structure allows us to 
perform bidirectional decoding from both ends of the subtrellis 
simultaneously [10], [31], [35]. Sections one through four and 
sections eight through five (in reverse order) are processed 
at the same time and path information corresponding to the 
most likely paths into the center eight states which are the 
destination states are stored. The two path metrics (one from 
each side) at a center state are then added. This gives path 
metrics of eight final survivors and the path with the largest 
path metric is the most likely path through the subtrellis. 
Since the resolution is done at the center of the subtrellis, 
the bottleneck of decoding caused by the large radix at the 
center states is avoided. This bidirectional decoding can be 
achieved by either using two identical subtrellis decoders 
working from both directions or using only one decoder to 
process the subtrellis in a concurrent bidirectional execution 
sequence as shown in Fig. 4. The second approach simply 
exploits the use of pipelining in the ACS implementation and 
the mirror symmetry of the subtrellis about the center axis. 
The bidirectional decoding results in advantages in speed and 
implementation. A block diagram for the overall decoder is 
shown in Fig. 5. We further note that sections two, three, 
four, five, six, and seven of each subtrellis decompose into 
eight parallel, eight-state, fully connected isomorphic sub- 
subtrellises as depicted in Fig. 3. This fact can be used to 
further reduce implementation complexity and increase the 
decoding speed. 
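
The final resolution step of bidirectional decoding can be sketched as follows. This is a simplified illustration rather than the decoder's actual datapath: `forward` and `backward` are hypothetical lists holding, for each of the eight center states, the best path metric reached from the left and from the right, and a larger metric is taken to be better, as in the text.

```python
def resolve_center(forward, backward):
    """Combine forward and backward survivor metrics at the center states.

    Returns (best_state, best_metric), where the metric of a full path
    through center state c is forward[c] + backward[c].
    """
    totals = [f + b for f, b in zip(forward, backward)]
    best_state = max(range(len(totals)), key=totals.__getitem__)
    return best_state, totals[best_state]


# Hypothetical metrics for the eight center states of one subtrellis.
forward = [3, 9, 4, 7, 1, 8, 2, 6]
backward = [5, 2, 6, 7, 9, 1, 4, 3]
state, metric = resolve_center(forward, backward)
```

Only eight sums and one eight-way comparison are needed per subtrellis at this point, which is why resolving at the center avoids a large-radix ACS there.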

VII. Conclusion 

We have presented an approach for decomposing the min- 
imal trellis of a binary linear block code into a nonminimal 
trellis composed of parallel components. This approach allows 
parallel processing of the subtrellises and does not increase 
the maximum number of states. Hence, it has a significant 
speed advantage. In addition, it also reduces the IC area 
requirements. Given a linear block code, we have estimated the 
limits to the benefits of this approach and its dependence on the 
uniform sectionalization of the trellis. The branch complexity 
of the nonminimal trellis relative to the minimal trellis can 
be larger in some sections. However, this does not increase 
the hardware complexity. Since the application of this method 
depends only on the generator matrix of the code, it can be 
applied to arbitrary linear block codes. 